AITopics | sa pair


Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control

Fu, Jing, Moran, Bill, Niño-Mora, José

arXiv.org Artificial Intelligence, Dec-4-2024

We study a system with finitely many groups of multi-action bandit processes, each of which is a Markov decision process (MDP) with finite state and action spaces and potentially different transition matrices under different actions. Bandit processes in the same group share the same state and action spaces and, for a given action, the same transition matrix. All bandit processes across groups are subject to multiple weakly coupled constraints on their state and action variables. Unlike past studies, which focused on the offline case, we consider the online case, without assuming full prior knowledge of the transition matrices and reward functions, and propose an effective scheme that enables simultaneous learning and control. We prove that the relevant processes converge both over time and as the number of bandit processes grows, referred to as convergence in the time and magnitude dimensions, respectively. Moreover, we prove that the relevant processes converge exponentially fast in the magnitude dimension, leading to an exponentially diminishing performance deviation between the proposed online algorithms and offline optimality.

Jing Fu is with the Department of Electrical and Electronic Engineering, School of Engineering, STEM College, RMIT University, Australia (e-mail: jing.fu@rmit.edu.au). Bill Moran is with the Department of Electrical and Electronic Engineering, the University of Melbourne, VIC 3010, Australia (e-mail: wmoran@unimelb.edu.au).
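The "learning" half of simultaneous learning and control requires estimating the unknown transition matrices online. The sketch below is a minimal illustration of that ingredient only, not the authors' scheme. It assumes that transitions observed from all bandit processes in one group can be pooled, because the abstract states they share a transition matrix for each action. The class name `TransitionEstimator` and the pseudo-count `prior` are hypothetical choices made for this example.

```python
import numpy as np


class TransitionEstimator:
    """Empirical transition-matrix estimates for ONE group of bandit processes.

    Hypothetical helper: processes in the same group share state/action spaces
    and, for a given action, the same transition matrix, so their observed
    (state, action, next_state) triples are pooled into one count table.
    """

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0):
        # Pseudo-count (an assumption) so unvisited rows start uniform instead of 0/0.
        self.counts = np.full((n_actions, n_states, n_states), prior, dtype=float)

    def observe(self, state: int, action: int, next_state: int) -> None:
        # Record one observed transition from any process in this group.
        self.counts[action, state, next_state] += 1.0

    def estimate(self) -> np.ndarray:
        # Normalize each (action, state) row into a next-state distribution.
        return self.counts / self.counts.sum(axis=2, keepdims=True)


est = TransitionEstimator(n_states=2, n_actions=2, prior=1.0)
for _ in range(8):
    est.observe(state=0, action=1, next_state=1)
P = est.estimate()  # P[a, s] is the estimated next-state distribution
```

In a full learn-and-control loop, a controller would re-plan periodically with the current `estimate()` while respecting the coupled constraints. That control step is not shown here.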

bandit process, convergence, index policy, (14 more...)

arXiv.org Artificial Intelligence

2412.03326

Country:

Oceania > Australia > Victoria > Melbourne (0.24)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL

Sun, Yanchao, Yin, Xiangyu, Huang, Furong

arXiv.org Machine Learning, Feb-16-2020

Transferring knowledge among various environments is important for efficiently learning multiple tasks online. Most existing methods directly use previously learned models or previously learned optimal policies to learn new tasks. However, these methods may be inefficient when the underlying models or optimal policies differ substantially across tasks. In this paper, we propose Template Learning (TempLe), the first PAC-MDP method for multi-task reinforcement learning that can be applied to tasks with varying state/action spaces. TempLe generates transition dynamics templates, which are abstractions of the transition dynamics across tasks, to improve sample efficiency by extracting similarities between tasks even when their underlying models or optimal policies have limited commonalities. We present two algorithms, one for an "online" setting and one for a "finite-model" setting. We prove that our proposed TempLe algorithms achieve much lower sample complexity than single-task learners or state-of-the-art multi-task methods. We show via systematically designed experiments that our TempLe method universally outperforms the state-of-the-art multi-task methods (PAC-MDP or not) in various settings and regimes.
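To make "transition dynamics templates" concrete, here is a minimal sketch of one way to group state-action (sa) pairs whose dynamics look alike even when the tasks have different state spaces. This is not the TempLe algorithm. Several choices are assumptions made for illustration: the permutation-invariant signature (next-state probabilities sorted in descending order and zero-padded), the greedy L1-distance grouping, and the tolerance `tol`. The function names are also hypothetical.

```python
import numpy as np


def template_signature(p_next: np.ndarray) -> np.ndarray:
    """Permutation-invariant summary of one sa pair's next-state distribution.

    Assumption: sorting in descending order keeps only the *shape* of the
    distribution, discarding which specific states are reached.
    """
    return np.sort(p_next)[::-1]


def group_into_templates(dists: list[np.ndarray], tol: float) -> list[int]:
    """Greedily assign each sa-pair distribution to a template index.

    A distribution joins the first existing template whose signature lies
    within `tol` in L1 distance; otherwise it starts a new template.
    """
    width = max(len(d) for d in dists)
    signatures: list[np.ndarray] = []
    labels: list[int] = []
    for d in dists:
        # Zero-pad so distributions over state spaces of different sizes compare.
        sig = template_signature(np.pad(d, (0, width - len(d))))
        for idx, ref in enumerate(signatures):
            if np.abs(sig - ref).sum() <= tol:
                labels.append(idx)
                break
        else:
            signatures.append(sig)
            labels.append(len(signatures) - 1)
    return labels


dists = [
    np.array([0.9, 0.1]),          # task A, sa pair 1
    np.array([0.1, 0.9]),          # task A, sa pair 2: same shape, states swapped
    np.array([0.5, 0.5]),          # task A, sa pair 3: different shape
    np.array([0.0, 0.05, 0.95]),   # task B, larger state space, similar shape
]
labels = group_into_templates(dists, tol=0.15)
```

Here the first, second, and fourth pairs share one template, and the third starts its own. Pooling samples across a template is the kind of cross-task sharing that could reduce sample complexity, though the paper's actual construction and guarantees may differ.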

algorithm, sa pair, transition dynamic, (11 more...)

arXiv.org Machine Learning

2002.06659

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > United Kingdom > England (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
